**********************************
*	Title: merge04.do
*	Date: 17 Sept 2006
*	Author: Zoe McLaren
*	Description: 
*		1. Group observations by household and reshape data to have one obs per hhold
*		2. Drop 15 obs that contain no data and have mismatched recno-id within vpid
*		3. Save new version to be merged with the individual record data (indiv04_18sept.do)
*		4. Append cleaned adultyouth, child and youngchild data sets
*		5. Merge individual data with VP data, and check quality of merge
*		6. Characterize the non-matched observations
*		7. Robustness checks on blank households (then we probably want to drop them)
**********************************


*Put VP data onto one line per hh
do "$syntax/vp04.do"

*Clean indiv data (including child & youngchild).  Append cleaned indiv data set.
do "$syntax/indiv04.do"

*Recode missing or negative vpno to 99999 so that hhid can be created
*Counts: 25 adult and 6 child records have missing vpno; 26 child records have vpno<0
replace vpno=99999 if (vpno==. | vpno<0)
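*Optional check (a sketch, not in the original): count any records still lacking a
*usable vpno. Extended missing values (.a-.z) are not caught by vpno==. above, so
*this is expected to be 0 only if vpno contains none of them.
count if missing(vpno) | vpno<0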


*Merge indiv data with dataset of reshaped (wide) VP data
sort eanumber vpno
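*Old-style (pre-Stata 11) match merge: key variables are listed without parentheses,
*and vp04.dta must already be sorted by eanumber vpno
merge eanumber vpno using "$data/vp04.dta"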
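label define merge 1 "only in individual" 2 "only in hh data" 3 "in both hh and indiv"
*Attach the value label to _merge (label var would only set the variable's description)
label values _merge merge
*Check quality of merge, overall and within each value (1-3) of group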
tab _merge 
forvalues i = 1/3 {
	tab _merge if group==`i'
}
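
*Side note, not part of the original script: a sketch assuming Stata 11 or later,
*where merge takes an explicit match type. Individuals are many per household and
*vp04.dta is assumed to hold one row per (eanumber vpno), so the merge above would
*correspond to the commented-out line below (no prior sort needed):
*	merge m:1 eanumber vpno using "$data/vp04.dta"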

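*Create household ID from EA number and VP number (merged file is saved at the end)
*Note: records recoded to vpno=99999 share one hhid within each eanumber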
sort eanumber vpno
egen hhid = group(eanumber vpno)
codebook hhid   /*everyone in same hh will have same VP data*/

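*Only in hh data: most of the unmatched records contain little or no information.
*checkvp = number of nonmissing responses among q1-q12
*(rownonmiss() is the Stata 9+ name for the older robs())
egen checkvp = rownonmiss(q1-q12)
tab checkvp if _merge==2, mi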

drop if checkvp==0 & _merge==2
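
*Optional check (a sketch, not in the original): number of hh-only records that
*remain because they have at least one nonmissing response in q1-q12
count if _merge==2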

*$S_DATE is Stata's current-date macro ($_S_DATE is undefined and expands to nothing)
label data "HSRC 2004 matched vp-indiv data, merged $S_DATE by ZM"
save "$data/hhindiv04.dta", replace

exit
